Algorithms and Data Design Issues for Basic NLP Tools
نویسنده
چکیده
This chapter presents some of the basic language engineering preprocessing steps (tokenization, part-of-speech tagging, lemmatization, and sentence and word alignment). Tagging is among the most important processing steps and its accuracy significantly influences any further processing. Therefore, tagset design, validation and correction of training data and the various techniques for improving the tagging quality are discussed in detail. Since sentence and word alignment are prerequisite operations for exploiting parallel corpora for a multitude of purposes such as machine translation, bilingual lexicography, import annotation etc., these issues are also explored in detail.
منابع مشابه
اثربخشی آموزش گروهی برنامهریزی عصب زبانشناختی بر میزان امید و کیفیت زندگی کودکان سرطانی
Objectives This study aimed to examine the effect of Neuro-Linguistic Programming (NLP) on the hope and quality of life in children with cancer. Methods The study design is quasi-experimental study with pretest, posttest, follow-up and control group. Study population consisted of children (male and female) with cancer at AminrKabir Hospital and Tabassom Cancer Support Community in 2016 who ap...
متن کاملA New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
متن کاملMachine Learning based NLP : Experiences and Supporting Tools
Coprus-based approaches to natural language analysis that utilize recent sophisticated machine learning algorithms have now become to achieve very good performance. In this talk I will overview and categorize machine learning based natural language processing tasks and our experiences of using machine learning to various tasks such as segmentation, POS tagging, phrase and NE chunking, and synta...
متن کاملDesign and implementation of Persian spelling detection and correction system based on Semantic
Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors. Also developing Persian tools will provide Persian progr...
متن کاملMulti-layer Clustering Topology Design in Densely Deployed Wireless Sensor Network using Evolutionary Algorithms
Due to the resource constraint and dynamic parameters, reducing energy consumption became the most important issues of wireless sensor networks topology design. All proposed hierarchy methods cluster a WSN in different cluster layers in one step of evolutionary algorithm usage with complicated parameters which may lead to reducing efficiency and performance. In fact, in WSNs topology, increasin...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009